.. code:: ipython3

    import pandas as pd
    from seeq import spy

    # Set the compatibility option so that you maximize the chance that SPy
    # will remain compatible with your notebook/script
    spy.options.compatibility = 192

Parameterized Jobs
==================

The simple scheduling methods described in :doc:`spy.jobs <../spy.jobs>`
will often be adequate for your purposes. But in some scenarios, you may
wish to run a suite of jobs across an asset group or some other set of
items. For this you will use the ``spy.jobs.push()`` command.

**This feature is only available for scheduling notebooks in Seeq Data
Lab. You cannot use SPy to schedule content in Anaconda, AWS SageMaker,
or any other Python environment.**

Assemble a DataFrame with the Parameters
----------------------------------------

Let’s take the most common example, which is to schedule a series of
jobs across a group of assets. Search for the assets:

.. code:: ipython3

    schedule_df = spy.search({
        'Path': 'Example >> Cooling Tower 1',
        'Type': 'Asset'
    })
    schedule_df

Now add a ``Schedule`` column, which dictates how often the script will
run. For intervals more frequent than 1 hour, it is highly recommended
that you use intervals that divide an hour evenly, like ‘15 minutes’,
‘20 minutes’ or ‘30 minutes’.

.. code:: ipython3

    schedule_df['Schedule'] = 'every 6 hours'
    schedule_df

You can also use Quartz Cron expressions in place of the natural
language phrasing above; an online Quartz Cron expression generator can
help you build them. As an example, the equivalent Quartz Cron
expression for “every 6 hours” is ``0 0 0/6 ? * * *``.

Sort your Schedule DataFrame
----------------------------

It’s important to sort the DataFrame so that the ordering of the items
is not dependent on how the constituent data happened to be returned by
Seeq or any other data source.

.. code:: ipython3

    # If you have an ID column, it's easiest to sort by that. Otherwise
    # pick something that will result in consistent ordering.
    schedule_df.sort_values('ID', inplace=True, ignore_index=True)

Push the jobs to Seeq
---------------------

The final step is to push the schedule DataFrame to Seeq so that it can
schedule the jobs.

It’s often desirable to “spread out” the execution of the jobs so that
they don’t all execute simultaneously. In this example, we’re executing
the jobs every 6 hours and we’ve asked ``spy.jobs.push()`` to spread
them out evenly over those 6 hours. (In general, the ``spread``
parameter is the same as the frequency of your schedule, since you want
all the jobs to execute within the time interval allocated.)

Execute the following cell (only) to schedule the set of jobs.

.. code:: ipython3

    parameters = spy.jobs.push(schedule_df, spread='6 hours', interactive_index=1)

If you are a Seeq administrator, you can view these jobs by going to
the *Administration* page and clicking on the *Jobs* tab. You will need
to clear the *Groups* filter to see the Notebook jobs.

In the output of the cell above, you’ll notice that the current context
is **INTERACTIVE**, which is the term we use for the scenario where you
are executing cells in the notebook yourself via the Seeq Data Lab user
interface. When you open an HTML file in the ``_Job Results`` folder,
you’ll see that the same cell shows the current context as **JOB**.

In the JOB context, ``parameters`` will be the row of the DataFrame
that pertains to that job instance. In the INTERACTIVE context,
``parameters`` will be the row that corresponds to
``interactive_index``.
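To see the difference concretely, you can inspect ``parameters`` right
after pushing. This is a minimal sketch that only uses the row returned
above; the column values come from the earlier ``spy.search()`` call
and will differ in your environment.

.. code:: ipython3

    # In the INTERACTIVE context, `parameters` is the schedule_df row
    # selected by interactive_index; in the JOB context it is the row
    # for the executing job. Either way it behaves like a pandas Series.
    print(f'Asset:    {parameters["Name"]} ({parameters["ID"]})')
    print(f'Schedule: {parameters["Schedule"]}')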
**We unschedule the jobs here so that your Seeq Data Lab isn’t loaded
down with executing this tutorial.**

.. code:: ipython3

    spy.jobs.unschedule()

Do something cool
-----------------

Now, based on the parameters in ``parameters``, you can do something
interesting. In this example we’ll push a condition to a new (small)
asset tree.

.. code:: ipython3

    parameters

Let’s pretend that we have a spiffy algorithm that can determine the
health of our asset by looking at a couple of signals.

.. code:: ipython3

    health_data_df = spy.pull(spy.search({
        'Asset': parameters['ID'],
        'Name': 'Temperature'
    }), header='Name')

    health_indicator = health_data_df.mean()['Temperature']
    health_status = 'HEALTHY' if health_indicator > 80 else 'UNHEALTHY'

.. code:: ipython3

    metadata_df = pd.DataFrame([{
        'Path': 'Parameterized Jobs Tutorial',
        'Asset': f'{parameters["Name"]}',
        'Name': 'Job Executions',
        'Type': 'Condition',
        'Maximum Duration': '1h'
    }])
    metadata_df

.. code:: ipython3

    import datetime

    start = datetime.datetime.now().isoformat()
    end = (datetime.datetime.now() + datetime.timedelta(minutes=5)).isoformat()

    capsule_data = pd.DataFrame([{
        'Capsule Start': pd.to_datetime(start),
        'Capsule End': pd.to_datetime(end),
        'Health': health_status
    }])
    capsule_data

.. code:: ipython3

    spy.push(capsule_data, metadata=metadata_df)

Scheduling from a separate notebook
-----------------------------------

The ``spy.jobs.push()`` function accepts a ``datalab_notebook_url``
parameter so that a job can be pushed to another notebook to which you
have access. A common use case is to let a user of an Add-on Mode
notebook configure a scheduled notebook through form input. In such a
scenario, the parameters specified by completing the form need to be
passed to the scheduled notebook.

.. code:: ipython3

    path_to_here = '/notebooks/SPy%20Documentation/Advanced%20Scheduling/Parameterized%20Jobs.ipynb'
    this_notebook_url = f'{spy.utils.get_data_lab_project_url()}{path_to_here}'
    spy.jobs.push(schedule_df, spread='6 hours', datalab_notebook_url=this_notebook_url)

No additional work is needed to make the parameters available in the
target notebook. The ``schedule_df`` used in the call to
``spy.jobs.push()`` is automatically pickled to a .pkl file in the
``_Job DataFrames`` folder of the notebook being scheduled. To retrieve
the parameters for a specific job in the jobs DataFrame from the
scheduled notebook, just call ``spy.jobs.pull()``:

.. code:: ipython3

    parameters = spy.jobs.pull(interactive_index=1)
    parameters

The **JOB** and **INTERACTIVE** contexts still apply as described
earlier in this tutorial. Use ``interactive_index`` to control which
row is returned by ``spy.jobs.pull()`` in the interactive context.

The ``push`` and ``pull`` methods can both be used with an additional
``label`` argument, which is useful for enabling reuse of a single
notebook with different parameters. For example, if you want one
schedule per user for a given notebook, the user’s ID could be used as
a label. This ensures that two distinct users can schedule the same
notebook, possibly with distinct parameters created from a separate
notebook or from another application, without unscheduling each
other’s jobs.
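A per-user schedule might look like the following sketch. It assumes
you are logged in (so that ``spy.user`` is populated with the current
user) and reuses ``schedule_df`` from earlier; any string that is
unique per user would work equally well as the label.

.. code:: ipython3

    # Each user's jobs carry their own label, so scheduling under one
    # label leaves jobs scheduled under other labels untouched.
    # spy.user.id is assumed here to hold the logged-in user's ID.
    spy.jobs.push(schedule_df, spread='6 hours', label=spy.user.id)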
Another use for a label is enabling the scheduling of a single notebook
from different Workbench Analyses using an Add-on Tool. In this case, a
convenient label would be an encoding of the Workbook and Worksheet IDs
of the origin worksheet, e.g.,
``workbookId=77953A64-0675-47AE-826F-DEE1FD7AB4C5&worksheetId=5C83DF79-D725-4756-BBE6-4D2D1525D4FF``.
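A minimal sketch of constructing such a label, assuming the workbook
and worksheet IDs have already been obtained from the originating
Analysis (the hard-coded IDs below are just the example values from
above):

.. code:: ipython3

    # Hypothetical IDs of the origin worksheet, e.g. passed to the
    # Add-on Tool by the Workbench Analysis that launched it
    workbook_id = '77953A64-0675-47AE-826F-DEE1FD7AB4C5'
    worksheet_id = '5C83DF79-D725-4756-BBE6-4D2D1525D4FF'

    label = f'workbookId={workbook_id}&worksheetId={worksheet_id}'
    spy.jobs.push(schedule_df, spread='6 hours', label=label)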